Skip to content

Conversation

@ssijaric-nv
Copy link
Contributor

@ssijaric-nv ssijaric-nv commented Dec 4, 2024

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64, defined in compiler-rt/lib/builtins/trampoline_setup.c. During AArch64
and PPC lowering, init.trampoline is lowered to a call to __trampoline_setup, with the
corresponding trampoline sizes passed to it. On x86_64 and RISCV, __trampoline_setup
isn't used, and the setup fits within 32 bytes.

@github-actions
Copy link

github-actions bot commented Dec 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ssijaric-nv ssijaric-nv marked this pull request as ready for review December 5, 2024 00:50
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:codegen labels Dec 5, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 5, 2024

@llvm/pr-subscribers-flang-codegen

Author: None (ssijaric-nv)

Changes

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64. During AArch64 and PPC lowering, init.trampoline is lowered
to a call to __trampoline_setup, with the corresponding trampoline sizes
passed to it.


Full diff: https://github.com/llvm/llvm-project/pull/118678.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp (+9-1)
  • (modified) flang/test/Fir/boxproc.fir (+9-3)
diff --git a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
index c536fd19fcc69a..e1c9e97d0b535b 100644
--- a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
+++ b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
@@ -268,10 +268,18 @@ class BoxedProcedurePass
             // Create the thunk.
             auto module = embox->getParentOfType<mlir::ModuleOp>();
             FirOpBuilder builder(rewriter, module);
+            const auto triple{fir::getTargetTriple(module)};
             auto loc = embox.getLoc();
             mlir::Type i8Ty = builder.getI8Type();
             mlir::Type i8Ptr = builder.getRefType(i8Ty);
-            mlir::Type buffTy = SequenceType::get({32}, i8Ty);
+            fir::SequenceType::Extent thunkSize = 32;
+            if (triple.isPPC32())
+              thunkSize = 40;
+            else if (triple.isPPC64())
+              thunkSize = 48;
+            else if (triple.isAArch64())
+              thunkSize = 36;
+            mlir::Type buffTy = SequenceType::get({thunkSize}, i8Ty);
             auto buffer = builder.create<AllocaOp>(loc, buffTy);
             mlir::Value closure =
                 builder.createConvert(loc, i8Ptr, embox.getHost());
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 9e4ea0bc210775..fbc1329892cbc7 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -1,7 +1,11 @@
-// RUN: tco %s | FileCheck %s
+// RUN: %if aarch64-registered-target %{tco --target=aarch64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-AARCH64 %}
+// RUN: %if x86-registered-target %{tco --target=x86_64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-X86 %}
+// RUN: %if powerpc-registered-target %{tco --target=powerpc64le-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-PPC %}
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy()
-// CHECK:         %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_3:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_3:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_1:.*]] = alloca { ptr }, i64 1, align 8
 // CHECK:         %[[VAL_0:.*]] = alloca i32, i64 1, align 4
 // CHECK:         %[[VAL_2:.*]] = getelementptr { ptr }, ptr %[[VAL_1]], i32 0, i32 0
@@ -59,7 +63,9 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 }
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy_char()
-// CHECK:         %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_20:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_20:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_2:.*]] = alloca { { ptr, i64 } }, i64 1, align 8
 // CHECK:         %[[VAL_1:.*]] = alloca [10 x i8], i64 1, align 1
 // CHECK:         %[[VAL_0:.*]] = alloca [40 x i8], i64 1, align 1

@llvmbot
Copy link
Member

llvmbot commented Dec 5, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: None (ssijaric-nv)

Changes

The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64. During AArch64 and PPC lowering, init.trampoline is lowered
to a call to __trampoline_setup, with the corresponding trampoline sizes
passed to it.


Full diff: https://github.com/llvm/llvm-project/pull/118678.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp (+9-1)
  • (modified) flang/test/Fir/boxproc.fir (+9-3)
diff --git a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
index c536fd19fcc69a..e1c9e97d0b535b 100644
--- a/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
+++ b/flang/lib/Optimizer/CodeGen/BoxedProcedure.cpp
@@ -268,10 +268,18 @@ class BoxedProcedurePass
             // Create the thunk.
             auto module = embox->getParentOfType<mlir::ModuleOp>();
             FirOpBuilder builder(rewriter, module);
+            const auto triple{fir::getTargetTriple(module)};
             auto loc = embox.getLoc();
             mlir::Type i8Ty = builder.getI8Type();
             mlir::Type i8Ptr = builder.getRefType(i8Ty);
-            mlir::Type buffTy = SequenceType::get({32}, i8Ty);
+            fir::SequenceType::Extent thunkSize = 32;
+            if (triple.isPPC32())
+              thunkSize = 40;
+            else if (triple.isPPC64())
+              thunkSize = 48;
+            else if (triple.isAArch64())
+              thunkSize = 36;
+            mlir::Type buffTy = SequenceType::get({thunkSize}, i8Ty);
             auto buffer = builder.create<AllocaOp>(loc, buffTy);
             mlir::Value closure =
                 builder.createConvert(loc, i8Ptr, embox.getHost());
diff --git a/flang/test/Fir/boxproc.fir b/flang/test/Fir/boxproc.fir
index 9e4ea0bc210775..fbc1329892cbc7 100644
--- a/flang/test/Fir/boxproc.fir
+++ b/flang/test/Fir/boxproc.fir
@@ -1,7 +1,11 @@
-// RUN: tco %s | FileCheck %s
+// RUN: %if aarch64-registered-target %{tco --target=aarch64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-AARCH64 %}
+// RUN: %if x86-registered-target %{tco --target=x86_64-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-X86 %}
+// RUN: %if powerpc-registered-target %{tco --target=powerpc64le-unknown-linux-gnu %s | FileCheck %s --check-prefixes=CHECK,CHECK-PPC %}
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy()
-// CHECK:         %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_3:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_3:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_3:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_1:.*]] = alloca { ptr }, i64 1, align 8
 // CHECK:         %[[VAL_0:.*]] = alloca i32, i64 1, align 4
 // CHECK:         %[[VAL_2:.*]] = getelementptr { ptr }, ptr %[[VAL_1]], i32 0, i32 0
@@ -59,7 +63,9 @@ func.func @_QPtest_proc_dummy_other(%arg0: !fir.boxproc<() -> ()>) {
 }
 
 // CHECK-LABEL: define void @_QPtest_proc_dummy_char()
-// CHECK:         %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-AARCH64: %[[VAL_20:.*]] = alloca [36 x i8], i64 1, align 1
+// CHECK-X86:     %[[VAL_20:.*]] = alloca [32 x i8], i64 1, align 1
+// CHECK-PPC:     %[[VAL_20:.*]] = alloca [4{{[0-8]+}} x i8], i64 1, align 1
 // CHECK:         %[[VAL_2:.*]] = alloca { { ptr, i64 } }, i64 1, align 8
 // CHECK:         %[[VAL_1:.*]] = alloca [10 x i8], i64 1, align 1
 // CHECK:         %[[VAL_0:.*]] = alloca [40 x i8], i64 1, align 1

@ssijaric-nv
Copy link
Contributor Author

ssijaric-nv commented Dec 5, 2024

There is a separate PR under #118234 to pass the correct size to __trampoline_setup during AArch64 lowering.

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@tblah
Copy link
Contributor

tblah commented Dec 5, 2024

Please could you provide links to the documentation you used, preferably both in code comments and the commit summary

@Leporacanthicus
Copy link
Contributor

Leporacanthicus commented Dec 5, 2024

Not a problem for this patch, but it would be REALLY nice if there was a function that could be called in LLVM to get the correct size for a trampoline, rather than each different user of LLVM having to implement their own variant of

if processorX size = A; 
else if processorY size = B; ...

Subject to providing some documentation for "why these values" (I understand what they represent and sort of why they need to be different, but it'd be good to have a source for what they are on different architectures).

@ssijaric-nv
Copy link
Contributor Author

Thanks, all. Added a comment. I'll add a function to LLVM to retrieve the trampoline size, and will then push a follow-on patch to replace the if...then...else checks.

@kkwli
Copy link
Collaborator

kkwli commented Dec 10, 2024

My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*.
@RolandF77 @mandlebug

@ssijaric-nv
Copy link
Contributor Author

My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*. @RolandF77 @mandlebug

Thank you for checking, Kelvin. I went by what's in PPCTargetLowering::LowerINIT_TRAMPOLINE, but these are only for Linux. Looks like it's safe to keep it at 32-bytes for all PPC targets, with the exception of ppc32 at 40 bytes?

@kkwli
Copy link
Collaborator

kkwli commented Dec 10, 2024

My colleagues found that the size is 32 for ppc64le, 12 for powerpc-*-aix* and 24 for powerpc64-*-aix*. @RolandF77 @mandlebug

Thank you for checking, Kelvin. I went by what's in PPCTargetLowering::LowerINIT_TRAMPOLINE, but these are only for Linux. Looks like it's safe to keep it at 32-bytes for all PPC targets, with the exception of ppc32 at 40 bytes?

I think it works for Linux. Thanks.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ssijaric-nv ssijaric-nv force-pushed the flang_trampoline_size branch from 5e5b89e to 273c5ed Compare January 22, 2025 04:17
The trampoline size is 36 bytes for AArch64, 40 bytes for PPC32 and 48 bytes
for PPC64.  During AArch64 and PPC lowering, init.trampoline is lowered
to a call to __trampoline_setup, with the corresponding trampoline sizes
passed to it.
@ssijaric-nv ssijaric-nv force-pushed the flang_trampoline_size branch from 273c5ed to 2191d54 Compare January 22, 2025 19:09
@ssijaric-nv ssijaric-nv merged commit 16e9601 into llvm:main Jan 27, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flang:codegen flang:fir-hlfir flang Flang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants